Abstract


Let $\pi_1,\ldots,\pi_k$ be normal populations with unknown means and a
common known variance. The goal is to find the population with the
largest mean. Two-stage procedures with screening at the first stage
are studied in a Bayesian approach. They are based on k samples of
common size $n_1$, drawn at Stage 1, and on samples of common size $n_2$
drawn at Stage 2 from all those populations which have not been screened
out at Stage 1. If only one population is selected at Stage 1, the
procedure stops at Stage 1.

Under  the  assumption  of  a  specific  loss  function  which  includes 
costs  of  sampling,  a  Bayes  procedure  is  derived  with  respect  to  i.i.d. 
normal  priors.  Its  properties  are  discussed  and  several  approximations 
are  considered.  The  expected  value  of  the  maximum  of  k  independent 
normals  with  known  but  distinct  means  and  a  common  known  variance  plays 
a crucial role in the determination of the Bayes procedure.


*This  research  was  supported  by  the  Office  of  Naval  Research  Contract 
N00014-75-C-0455 at Purdue University. Reproduction in whole or in
part  is  permitted  for  any  purpose  of  the  United  States  Government. 
1.  Introduction

Let $\pi_1,\ldots,\pi_k$ be k given normal populations with unknown means
$\theta_1,\ldots,\theta_k \in \mathbb{R}$ and a common known variance $\sigma^2 > 0$. For finding the
population with the largest mean, two-stage procedures with screening (elimination)
at the first stage are studied in a decision-theoretic framework. The
procedures are based on k samples of common size $n_1$ drawn at Stage 1, and
on a random number of samples of common size $n_2$ drawn at Stage 2 from all
those populations which have been selected (not eliminated) at Stage 1.

If  only  one  single  population  is  selected  at  Stage  1,  Stage  2  will  not  be 
entered.  In  particular,  the  stopping  rule  is  thus  determined  by  the  size 
of  the  selected  subset  at  Stage  1. 

Let $X = (X_1,\ldots,X_k)$ and $Y = (Y_1,\ldots,Y_k)$ denote the vectors of sample
means (which are sufficient statistics) at Stages 1 and 2, respectively, and
let $Z = (n_1 X + n_2 Y)/(n_1+n_2)$ denote the vector of the k overall means.
Although not all of the $Y_i$'s and $Z_i$'s are actually always observed, it will
prove to be convenient to consider Y and Z in the derivations to come. Also
for notational convenience, let $p = \sigma^2/n_1$ and $q = \sigma^2/n_2$.

Due  to  the  complexity  of  the  problem,  optimality  results  on  elimination 

type  multi-stage  procedures  are  rather  scarce  in  the  literature.  For  an 

overview  and  references,  see  Gupta  and  Panchapakesan  (1979)  and  Miescke 

(1982).  On  the  other  hand,  such  procedures  are  highly  desirable  for 

experimenters because of their economical use of observations. An intuitively
appealing  procedure  proposed  and  studied  by  Tamhane  and  Bechhofer  (1979), 
which  employs  Gupta's  maximum  means  subset  selection  procedure  first  and 
then  the  natural  terminal  decision,  deserves  to  be  revisited  from  the 
optimality  point  of  view.  Even  though  Gupta's  rule  has  been  shown  by  many 
authors  to  perform  well  as  a  single-stage  procedure,  its  performance  has  not 
been  studied  in  the  multi-stage  context  with  respect  to  optimality.  This 
then  was  one  of  our  motivations  to  prepare  the  present  paper  to  find  an 
answer  to  the  interesting  question:  What  type  of  subset  selection  rules 
are  used  in  optimal  two-stage  selection  procedures?  In  a  first  step 
towards  an  answer,  we  shall  derive  a  Bayes  solution  for  i.i.d.  normal 
priors  under  a  specific  loss  function  which  takes  into  account  costs  of 
sampling. 

Assumption (P):

We restrict ourselves to procedures of the following type: At
Stage 1, after X has been observed, a non-empty subset s(X) of $\{1,\ldots,k\}$
of random size is selected where, obviously, i is associated with $\pi_i$,
$i = 1,\ldots,k$. If its size |s(X)|, say, is equal to one, then the procedure
stops and selects the corresponding population. Otherwise, for each
$i \in s(X)$, $Y_i$ is observed and then a final selection is made from s(X) based
on X and $Y_i$, $i \in s(X)$. Furthermore it is assumed that the procedures are
permutation invariant.

The restriction of the final selections to populations $\pi_i$ with
$i \in s(X)$ is actually crucial for the feasibility of a solution to the
given problem. Under a fairly general loss structure which is permutation
invariant and which favors, at all stages, selections of populations with
large means, Gupta and Miescke (1982, 1983) have derived two optimality
results which can be stated in the present context as follows.

Fact 1: The natural final decision at Stage 2, which selects the population
associated with the largest $Z_i$, $i \in s(X)$, is optimum in terms of the risk
function, uniformly in $\theta = (\theta_1,\ldots,\theta_k) \in \mathbb{R}^k$. Due to the restriction
mentioned above, this remains true even if the complete vector Y had been
observed. Thus, for convenience, we can assume in the sequel without loss
of generality that all observations Y are taken at Stage 2, provided the
procedure has not stopped already at Stage 1.

Actually,  the  above  result  holds  for  all  exponential  families  whereas 
the  next  result  has  been  proved  only  for  strongly  unimodal  (log  concave) 
exponential  families.  However,  the  underlying  distributions  of  this  paper 
are  clearly  of  the  latter  type  and,  therefore,  both  results  can  be 
applied  in  the  present  setting. 

Fact 2: The class $\mathcal{D}$ of two-stage procedures which at Stage 1 make subset
selections in terms of the largest $X_i$'s and which employ the natural
final decision at Stage 2 constitutes an essentially complete class.

The only characteristic with respect to which members of $\mathcal{D}$ differ
from each other is |s(X)|, i.e. the decision of how many populations to
select at Stage 1 based on the observations X. Apparently, optimality
of a particular subset size decision is now closely related to the choice
of a specific loss structure. But even after such a choice has been
made, no procedure can be expected to be optimum with respect to the risk,
uniformly in $\theta \in \mathbb{R}^k$. Therefore in a first approach we shall study the
Bayes solution with respect to i.i.d. normal priors under the following
loss structure.


Assumption (L): Let $g: \mathbb{R}^k \to \mathbb{R}$ be a fixed function. If the procedure
stops at Stage 1 and selects $i \in \{1,\ldots,k\}$ then the loss function is
given by

(1)  $L_1(\theta, i) = c_1 n_1 + g(\theta) - \theta_i$,  $\theta \in \mathbb{R}^k$, $i \in \{1,\ldots,k\}$.

If at Stage 1 the procedure selects $s \subseteq \{1,\ldots,k\}$ with $|s| \ge 2$ and then,
at Stage 2, makes a final decision in favor of $i \in \{1,\ldots,k\}$ then the loss
function is given by

(2)  $L_2(\theta, s, i) = c_1 n_1 + c_2 n_2 |s| + g(\theta) - \theta_i$,  $\theta \in \mathbb{R}^k$.

Under the loss assumptions made above, $c_1 n_1$ and $c_2 n_2 |s|$ represent the
respective costs of sampling at the two stages whereas $\theta_i - g(\theta)$ can be
considered as a measure of the quality of the finally selected population.

A reasonable choice for g, for example, is $g(\theta) = \max\{\theta_1,\ldots,\theta_k\}$, $\theta \in \mathbb{R}^k$.
Obviously, the Bayes procedure cannot depend on $c_1$ since X has to be observed
at Stage 1 anyway. Less obviously, it will turn out that it also does not
depend on the choice of the function g. In Section 2, we shall derive the
Bayes solution explicitly and finally, in Section 3, approximations to
this solution will be considered.


2.  The  Bayes  Procedure 

From now on let us assume that the unknown population means are
random variables $\Theta_1,\ldots,\Theta_k$, say, which are independently and identically
normally distributed with a common known mean $\theta_0 \in \mathbb{R}$ and a common known variance
$r > 0$. It can be anticipated that the Bayes rule does not depend on $\theta_0$
since we are considering a location parameter model. As pointed out
before, all components of the Bayes rule are already known except the
decision of how many populations are to be selected at Stage 1. Let $d_s$,
$s \subseteq \{1,\ldots,k\}$, $|s| \ge 2$, denote the natural final decision at Stage 2,
i.e. $d_s(Z) = i$ if $Z_i = \max_{j \in s} Z_j$ and $i \in s$. Here and in the following, the
case of ties can be ignored since it occurs only with probability zero.
Moreover, let $s_t$ denote the natural subset selection with fixed size
$t \in \{1,\ldots,k\}$ at Stage 1, i.e. the selection of populations associated
with the t largest $X_i$'s. Working backwards from Stage 2 to Stage 1, the
Bayes rule can now be determined by comparisons of posterior expected
losses.

At Stage 2, given X = x and Y = y (or Z = z), respectively, for a
natural subset selection rule $s_t$, $t \in \{2,\ldots,k\}$, the posterior expected
loss is the following.

(3)  $E\{L_2(\Theta, s_t(x), d_{s_t(x)}(z)) \mid Z = z, X = x\} = c_1 n_1 + c_2 n_2 t + E\{g(\Theta) - \Theta_{j_0} \mid Z = z\}$

where $j_0$ is determined by $z_{j_0} = \max\{z_j \mid j \in s_t(x)\}$. Since at Stage 2,
Z is a sufficient statistic for $\theta$, the conditional distribution of $\Theta$,
given Z and X, depends only on Z. This fact will also be utilized in
(5) below.

Therefore, at Stage 1, given X = x, the posterior expected loss for
a natural subset selection rule $s_t$, $t \in \{2,\ldots,k\}$, is given by

(4)  $E\{L_2(\Theta, s_t(x), d_{s_t(x)}(Z)) \mid X = x\} = c_1 n_1 + c_2 n_2 t + E\{g(\Theta) \mid X = x\} - E\{\Theta_{j_0} \mid X = x\}$

where $j_0$, now being a random index, is determined by $Z_{j_0} = \max\{Z_j \mid j \in s_t(x)\}$.
At this point, the semigroup property of the normal distribution has to be
utilized to evaluate $E\{\Theta_{j_0} \mid X = x\}$.
(5)  $E\{\Theta_{j_0} \mid X = x\} = E\{E\{\Theta_{j_0} \mid Z\} \mid X = x\}$

     $= E\{\max_{j \in s_t(x)} \{(pq\theta_0 + qr x_j + pr Y_j)/(pq+qr+pr)\} \mid X = x\}$

     $= p(p+r)^{-1}\theta_0 + E(\max_{j \in s_t(x)} \{a x_j + b N_j\})$

where $a = r(p+r)^{-1}$, $b = pr((p+r)(pq+pr+qr))^{-1/2}$, and where $N_1,\ldots,N_k$ are
auxiliary i.i.d. standard normally distributed random variables which will
be used throughout the sequel.
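As a minimal sketch of how the constants in (5) could be computed, the following Python function evaluates p, q, a and b from the design quantities. The function name `posterior_constants` and its argument names are hypothetical, and the sketch assumes $p = \sigma^2/n_1$ and $q = \sigma^2/n_2$, which is the reading suggested by the posterior mean appearing in (5).

```python
import math

def posterior_constants(sigma2, n1, n2, r):
    """Return (p, q, a, b) as used in (5).

    Assumes p = sigma^2 / n1 and q = sigma^2 / n2 (sampling variances of the
    stage means) and prior variance r > 0; names are illustrative only.
    """
    p = sigma2 / n1
    q = sigma2 / n2
    a = r / (p + r)
    b = p * r / math.sqrt((p + r) * (p * q + p * r + q * r))
    return p, q, a, b

# Example: unit variance, one observation per stage, unit prior variance.
p, q, a, b = posterior_constants(1.0, 1, 1, 1.0)
```

Under these assumptions $b^2$ equals $pr/(p+r) - pqr/(pq+pr+qr)$, that is, the reduction in posterior variance of a single mean obtained by also observing its Stage 2 sample mean, which offers a quick plausibility check of the value returned for b.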

On the other hand, the posterior expected loss for the natural subset
selection rule $s_1$ is given by

(6)  $E\{L_1(\Theta, i_0) \mid X = x\} = c_1 n_1 + E\{g(\Theta) \mid X = x\} - E\{\Theta_{i_0} \mid X = x\}$,

where $i_0$ is determined by $x_{i_0} = \max\{x_i \mid i = 1,\ldots,k\}$. By a similar
argument as before, it can be seen that

(7)  $E\{\Theta_{i_0} \mid X = x\} = p(p+r)^{-1}\theta_0 + a \max_{i=1,\ldots,k} \{x_i\}$.

At Stage 1, given X = x, the Bayes procedure decides in favor of
the subset $s_t(x)$, $t = 1,\ldots,k$, if the associated posterior expected loss
is the smallest of those given in (4) and (6), respectively. To simplify
its representation, let us assume from now on that $x_1 < x_2 < \ldots < x_k$.

This can be done without loss of generality since the problem under
consideration is permutation invariant. The Bayes procedure can now be
described as follows. For notational convenience, let

(8)  $\varepsilon_t(x) = E(\max_{j \ge k-t+1} \{a(x_j - x_k) + b N_j\})$,  $t = 1,\ldots,k$,

and let $t^* \in \{2,\ldots,k\}$ be determined by

(9)  $\varepsilon_{t^*}(x) - c_2 n_2 t^* = \max_{t=2,\ldots,k} \{\varepsilon_t(x) - c_2 n_2 t\}$.

Theorem 1.  If $\varepsilon_{t^*}(x) - c_2 n_2 t^* < 0$, then the Bayes procedure stops at
Stage 1 and selects the population which is associated with $x_k$. Otherwise,
the Bayes procedure selects the $t^*$ populations which are associated with
$x_{k-t^*+1},\ldots,x_k$.
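The Stage 1 decision of Theorem 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `eps`, `simpson` and `bayes_stage1` are hypothetical, `cost` stands for the product $c_2 n_2$, and $\varepsilon_t(x)$ is evaluated by a composite Simpson rule applied to the standard representation of an expectation through its distribution function, truncated at eight standard deviations.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with an even number n of panels."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0

def eps(mus, b):
    """E(max_j {mu_j + b N_j}) for offsets mu_j <= 0 whose maximum is 0."""
    def cdf_max(xi):
        prod = 1.0
        for m in mus:
            prod *= Phi((xi - m) / b)
        return prod
    lower = simpson(cdf_max, -8.0 * b, 0.0)
    upper = simpson(lambda xi: 1.0 - cdf_max(xi), 0.0, 8.0 * b)
    return upper - lower

def bayes_stage1(x, a, b, cost):
    """Number of populations retained at Stage 1 (1 means: stop and select).

    cost is assumed to be c_2 * n_2; x holds the Stage 1 sample means.
    """
    xs = sorted(x)
    k = len(xs)
    mus = [a * (xj - xs[-1]) for xj in xs]
    best_t, best_val = None, -math.inf
    for t in range(2, k + 1):
        val = eps(mus[k - t:], b) - cost * t   # eps_t(x) - c_2 n_2 t
        if val > best_val:
            best_t, best_val = t, val
    return 1 if best_val < 0.0 else best_t
```

A call such as `bayes_stage1([0.0, 0.1, 0.2], 0.5, 0.4, 0.05)` returns the size of the retained subset; the retained populations are those with the largest Stage 1 means.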

From  an  applied  point  of  view,  it  can  be  seen  readily  that  the  Bayes 
procedure  can  be  used  without  too  much  computational  effort.  The  functions 
$\varepsilon_t(x)$, $t = 2,\ldots,k$, are simply expectations of maxima of independent normals
with given means and a common known variance. They can be determined
either  via  simulations  or,  more  precisely,  numerically  since  they  are 
one-dimensional  integrals  given  below  in  (17).  Several  useful  approximations 
will  be  derived  in  the  next  section.  These  are  considered  not  only  to 
simplify  the  application  but  also  to  gain  further  insight  into  the 
structure  of  the  Bayes  procedure.  In  the  remainder  of  this  section,  we 
shall  derive  some  basic  results  which  will  prove  to  be  useful  for  these 
considerations. 

First we point out that functions of the type $\varepsilon_t(x)$ have been
considered previously by Dunnett (1960), Chernoff and Yahav (1977)
and Miescke (1979). Let

(10)  $T(\xi) = \int_{-\infty}^{\xi} \Phi(\eta)\,d\eta = \varphi(\xi) + \xi\Phi(\xi)$,  $\xi \in \mathbb{R}$,

where $\varphi$ and $\Phi$ denote the standard normal density and cumulative distribution
function, respectively. Then it can be shown that

(11)  $\varepsilon_2(x) = 2^{1/2} b\, T(-2^{-1/2} b^{-1} a (x_k - x_{k-1}))$.
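For illustration, the function T of (10) and the closed form (11) can be coded directly. The names `T` and `eps2` are hypothetical, and the sketch uses only the Python standard library.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def T(xi):
    """T(xi) = phi(xi) + xi * Phi(xi), the integral of Phi up to xi, as in (10)."""
    return phi(xi) + xi * Phi(xi)

def eps2(x_k, x_km1, a, b):
    """Closed form (11) for eps_2(x), with x_k the largest and x_km1 the
    second largest Stage 1 mean."""
    root2 = math.sqrt(2.0)
    return root2 * b * T(-a * (x_k - x_km1) / (root2 * b))
```

When the two largest means coincide, `eps2` reduces to $2^{1/2} b\,T(0) = b\,\pi^{-1/2}$, which is b times the expected maximum of two independent standard normals.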

Therefore, if k = 2 or if $k \ge 3$ and the experimenter is not willing to
select more than two populations at Stage 1, then the optimum procedure
is of the form given below. Let $T^{-1}$ denote the inverse function to T.
Then the procedure is as follows:

"Stop at Stage 1 and select $\pi_k$ if

(12)  $x_{k-1} < x_k + 2^{1/2} b a^{-1} T^{-1}(2^{1/2} b^{-1} c_2 n_2)$.

Otherwise, select $\pi_{k-1}$ and $\pi_k$, proceed to Stage 2, and make the final
decision in terms of the larger of the two populations' overall sample
means."
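The stopping rule (12) requires the inverse of T, which has no closed form. One possible sketch, assuming plain bisection is acceptable, is given below; the names `T_inv` and `stop_at_stage1` are hypothetical and `cost` stands for $c_2 n_2$.

```python
import math

def T(xi):
    """T(xi) = phi(xi) + xi * Phi(xi), see (10)."""
    dens = math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-xi / math.sqrt(2.0))
    return dens + xi * cdf

def T_inv(y):
    """Inverse of T on (0, infinity) by bisection; T is strictly increasing."""
    if y <= 0.0:
        raise ValueError("T only takes positive values")
    lo, hi = -40.0, max(1.0, y) + 1.0   # T(lo) underflows to 0, T(hi) >= hi > y
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if T(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def stop_at_stage1(x_k, x_km1, a, b, cost):
    """True if rule (12) says to stop at Stage 1; cost is assumed to be c_2 * n_2."""
    root2 = math.sqrt(2.0)
    threshold = x_k + root2 * b / a * T_inv(root2 * cost / b)
    return x_km1 < threshold
```

With a = 0.5, b = 0.4 and cost = 0.05, for example, the rule stops when the runner-up mean lies far below the leader and continues to Stage 2 when the two means are close.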

Thus  in  the  case  of  k  =  2,  the  screening  rule  of  our  Bayes  procedure 
at  Stage  1  turns  out  to  be  of  the  form  of  Gupta's  single  stage  subset 
selection  procedure,  and  therefore  we  can  state  that  in  this  case  the 
two-stage  procedure  proposed  by  Tamhane  and  Bechhofer  (1979)  turns  out 
to  be  a  proper  Bayes  rule  with  respect  to  the  loss  function  (1)  and  (2) 
for  appropriately  chosen  r  and  c2. 

Though  in  general  Gupta's  rule  cannot  be  expected  to  be  used  by  the 
Bayes  procedure,  it  will  be  shown  in  the  next  section  that  this  is  true 
at least in the case of k = 3 if the means are equidistant, i.e. if
$x_3 - x_2 = x_2 - x_1$. At first, however, some basic results concerning the Bayes
procedure will be derived.

Theorem 2.  At Stage 1, given X = x with $x_1 < x_2 < \ldots < x_k$,

(13)  $\varepsilon_k(x) - \varepsilon_{k-1}(x) < \varepsilon_{k-1}(x) - \varepsilon_{k-2}(x) < \ldots < \varepsilon_2(x) - \varepsilon_1(x)$.

Proof: To simplify the notation, let from now on $\mu_j = a(x_j - x_k)$,
$j = 1,\ldots,k$. Thus we have in particular $\mu_1 < \mu_2 < \ldots < \mu_k = 0$. As in
the proof of Lemma 6 in Miescke (1979), it can be shown that for
$t = 2,\ldots,k$,

(14)  $\varepsilon_t(x) = \mu_{k-t+1} + b\,E(T(b^{-1}(\max_{j \ge k-t+2} \{\mu_j + b N_j\} - \mu_{k-t+1})))$.

From (14), $\varepsilon_1(x) = 0$, and the identity $T(\xi) = \xi + T(-\xi)$, $\xi \in \mathbb{R}$,
it follows that

(15)  $\varepsilon_t(x) - \varepsilon_{t-1}(x) = b\,E(T(b^{-1}(\mu_{k-t+1} - \max_{j \ge k-t+2} \{\mu_j + b N_j\})))$.

Since T is an increasing function, the assertion (13) can now be seen to
be correct.

In view of the above result, the screening procedure at Stage 1
can be simplified as follows. First note that for $t = 2,\ldots,k$,

(16)  $\varepsilon_t(x) - c_2 n_2 t = \varepsilon_2(x) - 2 c_2 n_2 + \sum_{r=3}^{t} (\varepsilon_r(x) - \varepsilon_{r-1}(x) - c_2 n_2)$.

Therefore, at the beginning one has to compute $\varepsilon_2(x) - 2 c_2 n_2$. Then
one has to evaluate and to add, successively, $\varepsilon_3(x) - \varepsilon_2(x) - c_2 n_2$,
$\varepsilon_4(x) - \varepsilon_3(x) - c_2 n_2, \ldots$ as long as these terms are positive. If finally the total
sum on the right-hand side of (16), with $t = t^*$, turns out to be positive,
then one selects $\pi_{k-t^*+1},\ldots,\pi_k$ and proceeds to Stage 2. Otherwise, one
stops and selects $\pi_k$. It should be pointed out clearly that it may
happen that $\varepsilon_2(x) < 2 c_2 n_2$ but nevertheless the total sum mentioned before
is positive.
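The successive scheme described after (16) might be sketched as follows. The names `increment` and `screen_stage1` are hypothetical, `cost` stands for $c_2 n_2$, and each difference $\varepsilon_t(x) - \varepsilon_{t-1}(x)$ is assumed to be evaluated as the integral, over the real line, of the probability that the newly added population exceeds a level which all previously included ones stay below, truncated at eight standard deviations.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with an even number n of panels."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0

def increment(mus, t, b):
    """eps_t(x) - eps_{t-1}(x); mus sorted ascending with mus[-1] == 0."""
    k = len(mus)
    low = mus[k - t]           # mu_{k-t+1}, the newly added population
    rest = mus[k - t + 1:]     # mu_j for j >= k-t+2
    def integrand(xi):
        prod = 1.0 - Phi((xi - low) / b)
        for m in rest:
            prod *= Phi((xi - m) / b)
        return prod
    return simpson(integrand, -8.0 * b, 8.0 * b)

def screen_stage1(x, a, b, cost):
    """Subset size chosen by the successive scheme (1 means: stop and select)."""
    xs = sorted(x)
    k = len(xs)
    mus = [a * (xj - xs[-1]) for xj in xs]
    total = increment(mus, 2, b) - 2.0 * cost   # eps_2 - 2 c_2 n_2, as eps_1 = 0
    t_star = 2
    for t in range(3, k + 1):
        gain = increment(mus, t, b) - cost
        if gain <= 0.0:        # by (13) all later gains are non-positive too
            break
        total += gain
        t_star = t
    return t_star if total > 0.0 else 1
```

Because of (13) the loop may stop at the first non-positive term, so at most k - 1 one-dimensional integrals have to be evaluated.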

3.  Approximations 

As noted in the preceding section, the functions $\varepsilon_t(x)$, $t = 2,\ldots,k$,
play  a  crucial  role  in  the  determination  of  the  Bayes  procedure.  Therefore, 
we  shall  study  them  now  in  more  detail.  We  shall  also  develop  several 
bounds  which  may  be  used  for  approximations  of  the  procedure.  Throughout 
the  following,  we  assume  that  we  are  at  Stage  1  where  X  =  x  has  been 
observed. 




Because of the permutation invariance of the procedure we can assume
without loss of generality that $x_1 < x_2 < \ldots < x_k$. As before, for
convenience, let $\mu_j = a(x_j - x_k)$, $j = 1,\ldots,k$. We start with the following
two well-known identities. For $t = 1,\ldots,k$,

(17)  $\varepsilon_t(x) = E(\max_{j \ge k-t+1} \{\mu_j + b N_j\})$

      $= -\int_{-\infty}^{0} \prod_{j \ge k-t+1} \Phi(b^{-1}(\xi - \mu_j))\,d\xi + \int_{0}^{\infty} [1 - \prod_{j \ge k-t+1} \Phi(b^{-1}(\xi - \mu_j))]\,d\xi$

and for $t = 2,\ldots,k$,

(18)  $\varepsilon_t(x) - \varepsilon_{t-1}(x) = \int_{\mathbb{R}} \prod_{j \ge k-t+2} \Phi(b^{-1}(\xi - \mu_j)) [1 - \Phi(b^{-1}(\xi - \mu_{k-t+1}))]\,d\xi$.

These results have been derived previously by Chernoff and Yahav (1977).
It should be mentioned that (17) and (14) as well as (18) and (15) are
related to each other through integration by parts.

Let us now consider the special case of k = 3 populations where the
means $x_1$, $x_2$, $x_3$ are equidistant. Here a simple expression for $\varepsilon_3(x)$
can be given using the following result.

Lemma 1.

(19)  $E(\max\{N_1, \alpha + N_2, 2\alpha + N_3\}) = 2^{1/2} T(2^{-1/2}\alpha) + 2^{-1/2} T(2^{1/2}\alpha)$

      $= E(\max\{N_1, \alpha + N_2\}) + 2^{-1} E(\max\{N_1, 2\alpha + N_3\})$,  $\alpha \in \mathbb{R}$.

Proof. By using (17), the left-hand side of (19) can be seen to be

(20)  $H(\alpha) = \alpha - \int_{-\infty}^{0} \Phi(x)\Phi(x-\alpha)\Phi(x+\alpha)\,dx + \int_{0}^{\infty} [1 - \Phi(x)\Phi(x-\alpha)\Phi(x+\alpha)]\,dx$.

Differentiation with respect to $\alpha$ and some standard manipulations lead to

      $H'(\alpha) = \Phi(2^{-1/2}\alpha) + \Phi(2^{1/2}\alpha)$.

Therefore, the first equation in (19) follows now by integration with
respect to $\alpha$ and by using (10). The second equation is a consequence of
(11).
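A numerical cross-check of Lemma 1 can be sketched as follows: the integral representation (20) is evaluated by a composite Simpson rule and compared with the closed form on the right-hand side of (19). The function names are hypothetical and the integration range is truncated, so agreement is only expected up to numerical error.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def T(xi):
    """T(xi) = phi(xi) + xi * Phi(xi), see (10)."""
    return math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi) + xi * Phi(xi)

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule on [lo, hi] with an even number n of panels."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0

def lemma1_numeric(alpha):
    """H(alpha) of (20), integrated over a truncated range."""
    def F(x):
        return Phi(x) * Phi(x - alpha) * Phi(x + alpha)
    w = 8.0 + abs(alpha)
    return alpha - simpson(F, -w, 0.0) + simpson(lambda x: 1.0 - F(x), 0.0, w)

def lemma1_closed(alpha):
    """Right-hand side of the first equation in (19)."""
    root2 = math.sqrt(2.0)
    return root2 * T(alpha / root2) + T(root2 * alpha) / root2
```

At $\alpha = 0$ both functions should return the expected maximum of three independent standard normals, $3/(2\pi^{1/2})$.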

The expression for $\varepsilon_3(x)$ in the case of equidistant means now is

(21)  $\varepsilon_3(x) = 2^{1/2} b\,T(-2^{-1/2} b^{-1} \Delta) + 2^{-1/2} b\,T(-2^{1/2} b^{-1} \Delta)$

where $\Delta = a(x_3 - x_2) = a(x_2 - x_1) > 0$. On the other hand, by (11), we have

(22)  $\varepsilon_2(x) = 2^{1/2} b\,T(-2^{-1/2} b^{-1} \Delta)$.

Here we have $\varepsilon_3(x) - \varepsilon_2(x) \le 2^{-1}\varepsilon_2(x)$, an inequality which does not hold
true in general. Therefore, the difficulty described at the end of
the last section cannot occur. If $\varepsilon_2(x) \le 2 c_2 n_2$, then also $\varepsilon_3(x) - \varepsilon_2(x) \le c_2 n_2$
holds and thus the stopping rule depends only on $\varepsilon_2(x)$. The optimum subset
selection rule now turns out to be the following.

"Select $\pi_3$. Furthermore, select $\pi_i$, i = 1,2, if and only if

(23)  $x_i > x_3 + 2^{1/2} a^{-1} b\,T^{-1}(2^{1/2} b^{-1} c_2 n_2)$."

This rule is of the Gupta type. Note that, in view of $T(0) = (2\pi)^{-1/2}$,
if $c_2 n_2 > 2^{-1}\pi^{-1/2} b$ then (23) cannot occur and Stage 2 will not be performed
in this case. We shall see below that this actually holds in the general
case of $k \ge 2$ and $\mu_1 < \mu_2 < \ldots < \mu_k = 0$.

Returning to the general case, a similar no-data check, before
entering Stage 1, can be done as follows. Let $a_t = E(\max\{N_1, N_2, \ldots, N_t\})$,
$t = 1,\ldots,k$. Various properties and tables of the $a_t$'s can be found in
David (1981). From (17) and (18) it follows that for $t = 2,\ldots,k$,
Therefore, if for a certain $t \in \{3,\ldots,k\}$, $a_t - a_{t-1} \le b^{-1} c_2 n_2$ then the
Bayes procedure selects at most t-1 populations. And since $a_3 - a_2 = a_2/2 = 2^{-1}\pi^{-1/2}$,
Stage 2 will never be performed if $c_2 n_2 \ge 2^{-1}\pi^{-1/2} b$.

Next, several properties of $\varepsilon_t(x)$, $t = 2,\ldots,k$, will be described
below from which bounds and approximations of the Bayes rule can be
derived later on.

Lemma 2.  For every $t \ge 2$, $\varepsilon_t(x)$ is a strictly increasing function of b
and $\mu_j$, $j = k-t+1,\ldots,k-1$.

Proof: The partial derivative of $\varepsilon_t(x)$ with respect to b in view of (17)
is

(25)  $b^{-2} \sum_{i=k-t+1}^{k} \int_{\mathbb{R}} (\xi - \mu_i) \prod_{j=k-t+1,\, j \ne i}^{k} \Phi(b^{-1}(\xi - \mu_j))\, \varphi(b^{-1}(\xi - \mu_i))\,d\xi$

      $= b^{-1} (\varepsilon_t(x) - \sum_{i=k-t+1}^{k} \mu_i P\{\mu_i + b N_i = \max_{j \ge k-t+1} \{\mu_j + b N_j\}\})$.

Since by (24) $\varepsilon_t(x) > 0 = \max\{\mu_j \mid j \ge k-t+1\}$, the first assertion follows.
The second one is obvious.

Lemma 3.  For every $t \ge 2$, $\varepsilon_t(x)$ considered as a function of $(\mu_{k-t+1},\ldots,\mu_{k-1})$
has the following Taylor expansion of first order at $\mu_{k-t+1} = \ldots = \mu_{k-1} = 0$.

(26)  $\varepsilon_t(x) = b a_t + t^{-1} \sum_{i=k-t+1}^{k} \mu_i + o(|\mu_{k-t+1}|)$.

Proof: For $i \in \{k-t+1,\ldots,k-1\}$, the partial derivative of $\varepsilon_t(x)$ with
respect to $\mu_i$, in view of (17), can be seen to be equal to

(27)  $\frac{\partial}{\partial \mu_i} \varepsilon_t(x) = P\{\mu_i + b N_i = \max_{j \ge k-t+1} \{\mu_j + b N_j\}\}$

which at $\mu_{k-t+1} = \ldots = \mu_{k-1} = 0$ is equal to $t^{-1}$. It is thus seen that (26)
holds.


Remark.  The second order term of the Taylor expansion can be shown to
be $(2b(t-1))^{-1} t a_t v_t$, where $v_t$ denotes the ordinary sample variance of
$\mu_{k-t+1},\ldots,\mu_k$. However, since we will not make any use of it in the
sequel, its derivation is omitted.

Lemma 4.  For every $t \ge 2$, $\varepsilon_t(x)$ is a Schur-convex function of
$(\mu_{k-t+1},\ldots,\mu_k)$, and thus in particular,

(28)  $\varepsilon_t(x) \ge t^{-1} \sum_{i=k-t+1}^{k} \mu_i + b a_t$.

Proof: In (17), $b N_{k-t+1},\ldots,b N_k$ are exchangeable multivariate normal
random variables with expectations 0, variances $b^2$ and covariances 0.
The function

      $h(u_1,\ldots,u_t) = \max_{j=1,\ldots,t} u_j$,  $u \in \mathbb{R}^t$,

is obviously Schur-convex. Therefore, by Marshall and Olkin (1979),
Ch. 11 E.9.,

      $\varepsilon_t(x) = E(h(\mu_{k-t+1} + b N_{k-t+1},\ldots,\mu_k + b N_k))$

is a Schur-convex function of $(\mu_{k-t+1},\ldots,\mu_k)$. Then the inequality
(28) follows immediately from the fact that $(\mu_{k-t+1},\ldots,\mu_k)$ majorizes
$t^{-1}(\mu_{k-t+1} + \ldots + \mu_k)(1,1,\ldots,1)$.

The results of Lemmas 3 and 4 have interesting consequences.
Suppose we replace in our Bayes procedure, i.e. in (8) and (9), for
$t \ge 2$, $\varepsilon_t(x)$ by, say,

(29)  $\hat{\varepsilon}_t(x) = t^{-1} \sum_{j=k-t+1}^{k} \mu_j + b a_t$.
In view of (26), this can be justified as a reasonable approximation as
long as $|\mu_{k-t+1}| = a(x_k - x_{k-t+1})$ is small. In all other cases, because of
(28), it can be considered as an approximation to the Bayes procedure
which is conservative with respect to costs of sampling. Let us take a
brief look at this approximate procedure. Since the functions $\hat{\varepsilon}_t(x)$ do
not have the property of the functions $\varepsilon_t(x)$ given in (13), (16) and the
process described after (16) are not applicable. Therefore, let us consider
the original form of the Bayes procedure as described in Theorem 1,
with $\varepsilon_t(x)$ replaced by $\hat{\varepsilon}_t(x)$, $t = 2,\ldots,k$. It is of the following form.

"Select $\pi_{k-t+1},\ldots,\pi_k$ and proceed to Stage 2, if

(30)  $t^{-1} \sum_{j=k-t+1}^{k} x_j > x_k - a^{-1} b a_t + a^{-1} c_2 n_2 t$,

where for t, $a t^{-1} \sum_{j=k-t+1}^{k} x_j + b a_t - c_2 n_2 t$, $t = 2,\ldots,k$, assumes its
largest value. Otherwise, stop and select $\pi_k$."
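The approximate rule (30) lends itself to a short sketch. The names `a_t` and `approx_stage1` are hypothetical, `cost` stands for $c_2 n_2$, and the constants $a_t$ are obtained here by numerical integration rather than taken from tables.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with an even number n of panels."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0

def a_t(t):
    """a_t = E(max{N_1,...,N_t}) for i.i.d. standard normals, by integration."""
    lower = simpson(lambda z: Phi(z) ** t, -8.0, 0.0)
    upper = simpson(lambda z: 1.0 - Phi(z) ** t, 0.0, 8.0)
    return upper - lower

def approx_stage1(x, a, b, cost):
    """Subset size under rule (30) (1 means: stop and select the best)."""
    xs = sorted(x)
    k = len(xs)
    best_t, best_val = None, -math.inf
    for t in range(2, k + 1):
        mean_top = sum(xs[k - t:]) / t
        # Approximation (29) minus the Stage 2 sampling cost for t populations.
        val = a * (mean_top - xs[-1]) + b * a_t(t) - cost * t
        if val > best_val:
            best_t, best_val = t, val
    return best_t if best_val > 0.0 else 1
```

Only averages of the largest Stage 1 means enter the decision, so no integral depending on the data has to be evaluated.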

Here, the interesting feature is that the averages of populations
associated with large means are compared with the maximum mean at Stage
1. Considered as a one-stage subset selection procedure, this rule can
be seen also to be an approximate Bayes solution for the one-stage
subset selection problem under the same distributional assumptions as
before but now with a loss function of the type

(31)  $L(\theta, s) = \max_{j=1,\ldots,k} \theta_j - |s|^{-1} \sum_{j \in s} \theta_j + \gamma_{|s|}$,

where $\gamma_1,\ldots,\gamma_k$ are appropriately chosen constants which may represent,
for example, the costs of using the selected populations in the future.

In Chernoff and Yahav (1977), a Bayes single stage subset selection
procedure has been studied where one of the two components of the loss
function is equal to $L(\theta, s) - \gamma_{|s|}$. It should be pointed out that a
subset selection rule of the form given in (30) has not been considered
in the literature till now. It would be interesting to study its performance,
in a non-Bayesian approach, for suitably chosen constants replacing
$a^{-1}(c_2 n_2 t - b a_t)$ in (30), $t = 1,\ldots,k$.

The approximation of our two-stage Bayes procedure at Stage 1
considered above can of course also be performed partially, i.e. for specific
t-values.  In  a  similar  way  other  lower  bounds,  and  upper  bounds  as  well, 
can  be  used  to  approximate  the  optimum  procedure.  For  this  purpose, 
several  bounds  will  be  given  below.  Some  of  them  have  been  derived  already 
in  Miescke  (1979),  but  for  the  sake  of  making  the  present  paper  self- 
contained  they  will  be  included  in  the  list  presented  below,  partly 
accompanied  by  shorter  proofs. 

Lemma 5.  The functions listed below are lower bounds of $\varepsilon_t(x)$,
$t = 2,\ldots,k$.

(32)  $2^{1/2} b\,T(-2^{-1/2} b^{-1} a (x_k - x_{k-1}))$

(33)  $\varepsilon_{t-1}(x) + b\,T(-b^{-1}(\varepsilon_{t-1}(x) + a(x_k - x_{k-t+1})))$

(34)  $b\,T(a_{t-1} - b^{-1} a\, t (t-1)^{-1} (x_k - t^{-1} \sum_{j=k-t+1}^{k} x_j))$

(35)  $b a_t + a(t^{-1} \sum_{j=k-t+1}^{k} x_j - x_k)$.

Proof: From $\varepsilon_t(x) \ge \varepsilon_2(x)$, (32) follows immediately since by (11), for
t = 2, (32) is equal to $\varepsilon_2(x)$.

Since $T(\xi)$, $\xi \in \mathbb{R}$, is a strictly convex function, Jensen's inequality
can be applied to (14). This, together with the identity $T(\xi) = \xi + T(-\xi)$,
$\xi \in \mathbb{R}$, leads to (33).
To prove (34), consider the following identity which is of the same
type as (14).

(36)  $\varepsilon_t(x) = b\,E(T(b^{-1} \max\{\mu_j + b N_j \mid k-t+1 \le j \le k-1\}))$.

Since T is an increasing function, $T(\max\{u_1,\ldots,u_{t-1}\})$, $u \in \mathbb{R}^{t-1}$, is
Schur-convex. By the same argument as used in the proof of Lemma 4, a
lower bound on $\varepsilon_t(x)$ is thus given by

(37)  $b\,E(T(\max\{N_j \mid k-t+1 \le j \le k-1\} + b^{-1}(t-1)^{-1} \sum_{j=k-t+1}^{k-1} \mu_j))$.

Therefore, by applying Jensen's inequality, it can be seen that (34)
is a lower bound of $\varepsilon_t(x)$. Note that (35) has been derived in Lemma 4.
In the lemma above, it is only mentioned for the sake of completeness.

It should be pointed out that (33) can be iterated and then (32),
(34) or (35) can be applied to the result to get further lower bounds for
$\varepsilon_t(x)$, which then, of course, are weaker than the previous ones.

Useful upper bounds for $\varepsilon_t(x)$ are harder to find. Besides the obvious
upper bound $b a_t$, the following can be established which can be considered
as a counterpart to (32).

Lemma 6.  For $t = 2,\ldots,k$,

(38)  $\varepsilon_t(x) \le 2^{1/2} b \sum_{j=k-t+1}^{k-1} T(2^{-1/2} b^{-1} a (x_j - x_k))$.

Proof: Consider the inequality

      $\max_{j \ge k-t+1} u_j \le u_k + \sum_{j=k-t+1}^{k-1} \max\{0, u_j - u_k\}$.

Applying it to (8), and using (11), the result follows immediately.
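To get a feeling for how tight the bounds are, one might compare the lower bounds (32) and (35), the upper bound (38) and the obvious upper bound $b a_t$ with a numerically integrated value of $\varepsilon_t(x)$. The following is a rough sketch; the name `bounds` and the example inputs are hypothetical.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def T(xi):
    """T(xi) = phi(xi) + xi * Phi(xi), see (10)."""
    return math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi) + xi * Phi(xi)

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with an even number n of panels."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0

def eps(mus, b):
    """E(max_j {mu_j + b N_j}) for offsets mu_j <= 0 whose maximum is 0."""
    def cdf_max(xi):
        prod = 1.0
        for m in mus:
            prod *= Phi((xi - m) / b)
        return prod
    return (simpson(lambda xi: 1.0 - cdf_max(xi), 0.0, 8.0 * b)
            - simpson(cdf_max, -8.0 * b, 0.0))

def bounds(x, a, b, t):
    """Return (lower (32), lower (35), eps_t(x), upper (38), b * a_t)."""
    xs = sorted(x)
    k = len(xs)
    mus = [a * (xj - xs[-1]) for xj in xs][k - t:]
    root2 = math.sqrt(2.0)
    a_t = eps([0.0] * t, 1.0)      # expected maximum of t standard normals
    lower32 = root2 * b * T(mus[-2] / (root2 * b))
    lower35 = b * a_t + sum(mus) / t
    upper38 = root2 * b * sum(T(m / (root2 * b)) for m in mus[:-1])
    return lower32, lower35, eps(mus, b), upper38, b * a_t

lo32, lo35, value, up38, up_obvious = bounds([0.0, 0.3, 0.5, 1.0], 0.5, 0.4, 3)
```

For t = 2 the bounds (32) and (38) both coincide with $\varepsilon_2(x)$, so the comparison is informative only for $t \ge 3$.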